IMPORTANT NOTICE 


Texas Instruments (TI) reserves the right to make changes in the 
devices or the device specifications identified in this publication 
without notice. TI advises its customers to obtain the latest version 
of device specifications to verify, before placing orders, that the 
information being relied upon by the customer is current. 


TI warrants performance of its semiconductor products, including SNJ 
and SMJ devices, to current specifications in accordance with TI’s 
standard warranty. Testing and other quality control techniques are 
utilized to the extent TI deems such testing necessary to support this 
warranty. Unless mandated by government requirements, specific 
testing of all parameters of each device is not necessarily performed. 


In the absence of written agreement to the contrary, TI assumes no 
liability for TI applications assistance, customer’s product design, or 
infringement of patents or copyrights of third parties by or arising from 
use of semiconductor devices described herein. Nor does TI warrant 
or represent that any license, either express or implied, is granted 
under any patent right, copyright, or other intellectual property right 
of TI covering or relating to any combination, machine, or process in 
which such semiconductor devices might be or are used. 


Copyright © 1985, Texas Instruments Incorporated 


INTRODUCTION 


Matrix multiplication is useful in applications such as 
graphics, numerical analysis, or high-speed control. The 
purpose of this application report is to illustrate matrix 
multiplication on two digital signal processors, the 
TMS32010 and TMS32020. 

Both the TMS32010 and TMS32020 can multiply any 
two matrices of size M x N and N x P. The programs for 
the TMS32010 and TMS32020, included in the appendices, 
can multiply large matrices and are limited only by the 
amount of internal data RAM available. Assuming a 200-ns 
cycle time, the TMS32010 and TMS32020 can calculate 
[1 x 3] x [3 x 3] in 5.4 microseconds. 

Before discussing the two versions of implementing a 
matrix multiplication algorithm, a brief review of matrix 
multiplication is presented along with three examples of 
graphics applications. 


MATRIX MULTIPLICATION 


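The size of a matrix is defined by the number of rows 
and columns it contains. For example, the following is a 5 x 3 
matrix since it contains five rows and three columns. 


$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \\
a_{41} & a_{42} & a_{43} \\
a_{51} & a_{52} & a_{53}
\end{bmatrix}
$$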


Any two matrices can be multiplied together as long 
as the second matrix has the same number of rows as the 
first has of columns. This condition is called conformability. 
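For example, if a matrix A is an M x N matrix and a matrix 
B is an N x P matrix, then the two can be multiplied together 
with the resulting matrix being of size M x P. 


Example, with M x N = 2 x 2, N x P = 2 x 1, and M x P = 2 x 1: 
the row [3 4] of A multiplied by the column [4 6] of B gives the 
corresponding element of the product, (3)(4) + (4)(6) = 36. 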


Given the two conformable matrices A and B, the 
elements of C = A x B are given by: 


$$
c_{ij} = \sum_{k=1}^{N} a_{ik} \times b_{kj}
$$


for i = 1,...,M and j = 1,...,P 
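
The definition above can be sketched directly in a few lines of Python. This is only an illustration of the summation, not the TMS320 programs from the appendices; the function name `matmul` and the list-of-rows representation are assumptions made for the example.

```python
def matmul(a, b):
    """Multiply an M x N matrix by an N x P matrix, both given as lists of rows."""
    m, n = len(a), len(a[0])
    if len(b) != n:
        # Conformability: B must have as many rows as A has columns.
        raise ValueError("matrices are not conformable")
    p = len(b[0])
    c = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            # c[i][j] is the sum over k of a[i][k] * b[k][j]
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
    return c


# The row [3 4] and the column [4 6] from the example: (3)(4) + (4)(6) = 36
print(matmul([[3, 4]], [[4], [6]]))  # [[36]]
```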


Q12 FORMAT 


Applications often require multiplication of mixed 
numbers. Since the TMS32010 and TMS32020 implement 
fixed-point arithmetic, the programs in the appendices assume 
a Q12 format, i.e., 12 bits follow an assumed binary point. 
The bits to the right of the assumed binary point represent 
the fractional part of the number and the four bits to the left 
represent the integer part of the number. An example of Q12 
format is as follows: 


0001.110111100000 = 1.866 
    ^ ASSUMED BINARY POINT 


          0000.110111100000 = 0.866 in Q12 
  x       0000.100000000000 = 0.5 in Q12 
  --------------------------------- 
  00000000.011011110000000000000000 = 0.433 in Q24 


The result of a Q12 by Q12 multiplication is a number 
in a Q24 format that can easily be converted to Q12 by a 
logical left-shift of four and storing the upper 16 bits of the 
result. The first four bits will be lost as well as the last 
twelve, but these bits are insignificant for Q12. Note that the 
programs in the appendices provide no protection against 
overflow; therefore, the design engineer should implement a 
format that best fits the application. 
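
The Q12 arithmetic described above can be checked with a short Python sketch. It models a 32-bit accumulator and a 16-bit result word; the helper names (`q12_mul`, `from_q12`) are assumptions for illustration and do not come from the appendix programs.

```python
Q = 12  # number of bits after the assumed binary point


def from_q12(value):
    """Convert a Q12 integer to a Python float."""
    return value / (1 << Q)


def q12_mul(a, b):
    """Multiply two Q12 integers and return a 16-bit Q12 result."""
    product_q24 = a * b                         # Q12 x Q12 gives Q24
    shifted = (product_q24 << 4) & 0xFFFFFFFF   # left-shift of four in 32 bits
    high = shifted >> 16                        # keep the upper 16 bits
    if high & 0x8000:                           # reinterpret as signed 16-bit
        high -= 0x10000
    return high


# Bit patterns from the example: 0000.110111100000 and 0000.100000000000
result = q12_mul(0x0DE0, 0x0800)
print(hex(result), from_q12(result))  # 0x6f0 0.43359375
```

As in the text, nothing here guards against overflow: a product of 8.0 or more in magnitude loses its top bits in the shift.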


GRAPHICS APPLICATIONS 


Operations in graphics applications, such as translation, 
scaling, or rotation, require matrix manipulations to be 
performed in a limited amount of time. Therefore, the 
TMS32010 and TMS32020 processors are ideal for these 
applications. Graphics applications, such as scaling and 
rotation of points in a coordinate system, require 
multiplication of matrices. Translation is typically 
implemented by addition of two matrices. However, when 
points are represented in a homogeneous coordinate system, 
translation can be implemented by multiplication. In a 
homogeneous coordinate system, a point P(X,Y) is 
represented as P(X,Y,1). This type of coordinate system is 
desirable since it relates translation with scaling and rotation. 

Translation can be defined as the moving of a point 
or points in a coordinate system from one location to another 
without rotating. This is accomplished by adding a 
displacement value Dx to the X coordinate of a point and 
adding a displacement value Dy to the Y coordinate, thus 
moving the point from one location to another. Figure 1 
shows both addition and multiplication methods of translation 
and an example of each. 

ADDITION METHOD 

$$
[X_{NEW} \;\; Y_{NEW}] = [X_{OLD} \;\; Y_{OLD}] + [D_x \;\; D_y]
$$

where Dx = 5 and Dy = 1 

MULTIPLICATION METHOD 

$$
[X_{NEW} \;\; Y_{NEW} \;\; 1] = [X_{OLD} \;\; Y_{OLD} \;\; 1]
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
D_x & D_y & 1
\end{bmatrix}
$$

where Dx = 5 and Dy = 1 


Figure 1. Translation of Coordinates 


Similar to translation, scaling can be implemented by 
matrix multiplication. Points can be scaled by multiplying 
each coordinate of a point (or points) by a scaling value Sx 
and Sy. Scaling an object is similar to stretching or shrinking 
an object. The coordinates of each point that makes up the 
object are multiplied by a scaling value which scales the 
object to a larger or smaller scale. Figure 2 shows the scaling 
of an object from one size to another. 

[Plots: the object BEFORE SCALING and AFTER SCALING] 

Let the scaling factors Sx and Sy = 0.5 

$$
[X_{NEW} \;\; Y_{NEW} \;\; 1] = [X_{OLD} \;\; Y_{OLD} \;\; 1]
\begin{bmatrix}
S_x & 0 & 0 \\
0 & S_y & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

$$
[X \;\; Y \;\; 1] = [4 \;\; 4 \;\; 1]
\begin{bmatrix}
0.5 & 0 & 0 \\
0 & 0.5 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

$$
[X \;\; Y \;\; 1] = [2 \;\; 2 \;\; 1]
$$


Figure 2. Scaling From One Size To Another 


Rotation of the coordinates of a point (or points) about 
an angle theta can also be accomplished by a matrix 
multiplication. The following set of equations results in 
the matrix multiplication required to rotate an object about 
any angle. 

$$
\begin{aligned}
X_{OLD} &= r \cos\phi \\
Y_{OLD} &= r \sin\phi \\
X_{NEW} &= r \cos(\theta + \phi) = r \cos\phi \cos\theta - r \sin\phi \sin\theta \\
Y_{NEW} &= r \sin(\theta + \phi) = r \cos\phi \sin\theta + r \sin\phi \cos\theta \\
X_{NEW} &= X_{OLD} \cos\theta - Y_{OLD} \sin\theta \\
Y_{NEW} &= X_{OLD} \sin\theta + Y_{OLD} \cos\theta
\end{aligned}
$$

OR 

$$
[X_{NEW} \;\; Y_{NEW} \;\; 1] = [X_{OLD} \;\; Y_{OLD} \;\; 1]
\begin{bmatrix}
\cos\theta & \sin\theta & 0 \\
-\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
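
The translation and scaling matrices can be tried out with a small Python sketch that multiplies a homogeneous row vector [X Y 1] by a 3 x 3 matrix. The function names and the point (2, 3) used for the translation are assumptions chosen for illustration; the scaling call reproduces the numbers of Figure 2.

```python
def transform(point, matrix):
    """Multiply a homogeneous row vector [X, Y, 1] by a 3 x 3 matrix."""
    return [sum(point[k] * matrix[k][j] for k in range(3)) for j in range(3)]


def translation(dx, dy):
    """Translation matrix with the displacements in the bottom row."""
    return [[1, 0, 0],
            [0, 1, 0],
            [dx, dy, 1]]


def scaling(sx, sy):
    """Scaling matrix with the scale factors on the diagonal."""
    return [[sx, 0, 0],
            [0, sy, 0],
            [0, 0, 1]]


# Figure 2: the point (4, 4) scaled by Sx = Sy = 0.5
print(transform([4, 4, 1], scaling(0.5, 0.5)))   # [2.0, 2.0, 1]

# Dx = 5 and Dy = 1 applied to a hypothetical point (2, 3)
print(transform([2, 3, 1], translation(5, 1)))   # [7, 4, 1]
```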


Figure 3 shows an implementation of these equations 
to rotate an object 30 degrees about the origin. 
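
A 30-degree rotation in Q12 can be sketched in Python as follows. The starting point (1, 0) is a hypothetical choice, the helper names are assumptions, and the right shift by 12 stands in for the accumulator shift-and-store that a fixed-point processor would perform; it gives the same Q12 value for results that stay in range.

```python
import math

Q = 12  # fractional bits in Q12


def to_q12(x):
    """Convert a float to the nearest Q12 integer."""
    return int(round(x * (1 << Q)))


def rotation_q12(theta_deg):
    """Homogeneous rotation matrix for row vectors, with Q12 entries."""
    t = math.radians(theta_deg)
    c, s = to_q12(math.cos(t)), to_q12(math.sin(t))
    return [[c, s, 0],
            [-s, c, 0],
            [0, 0, to_q12(1.0)]]


def transform_q12(point, matrix):
    """Row vector times matrix; products are summed in Q24, then cut to Q12."""
    return [sum(point[k] * matrix[k][j] for k in range(3)) >> Q
            for j in range(3)]


old = [to_q12(1.0), to_q12(0.0), to_q12(1.0)]   # hypothetical point (1, 0)
new = transform_q12(old, rotation_q12(30))
print([v / (1 << Q) for v in new])              # roughly [0.866, 0.5, 1.0]
```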

Figures 4 and 5 show a segment of straight-line 
TMS32010 and TMS32020 code, respectively. These 
programs calculate the coordinate rotation example using a 
Q12 format. Note that once the matrices are loaded into 
memory, the processors can calculate the results in 5.4 
microseconds. The segment of TMS32020 code in Figure 5 
uses the MAC instruction. For small matrices, the 
MAC instruction in conjunction with the RPT instruction 
gains little due to the overhead timing of the MAC 
instruction. However, for larger matrices, this method is 
most efficient since the MAC instruction becomes single- 
cycle in the repeat mode. For applications that only require 
translation, scaling, or rotation of coordinates, straight-line 
code as in Figures 4 and 5 is more efficient than the larger 
programs in the appendices. 
Figure 3. Implementation of Rotation Matrix 
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Ooms 
Ooms y 
OoO4ad 
O04 1 
OO4s 
Omar 
OO44 
OOS 
OOAA 
O47? 
CC dh ist 
Cree 
OORO 
OOS I 
OOer 
OOF 
OO a 
OOS 
CH A, 
OOe ? 
CO es 
OOe 


CW IAC) 


OO1B 
Oo 
oo1n 
GOOLE 
OOLF 
OOBO 
OOE 1 
QO2E 


wBeee ties 


er ee 
S 


Oc 
CHO 4h 
OO 
OO 
OO? 
OOD 
QO? 
OSA 
OO ER 
OO 
Mae 
OSE 
OOF 
OOS 


OCs T 


NO ERAS | 


7 
7109 
6001 


v 


tii 


A4TIA0 


AISA 1 
AHaTIQAQ 
AAA 
ANAG 
TEESE 
on] Od 
408 
FFE 
raneh7 
S441 
STAO 
AVAL 
QAO 
HI 1 
& TAQ 
PPB 
rE 
et 


7F ED 


NO WARN TANG 


ZA: 
LARE 
LT 
MEY 
LTA 
Mey 
LTA 
MY’ 
AF AM 
AIH 
CLIT 
ZAM 
LARE 
LT 
MP Y 
LTA 
Mey 
LTA 
fry’ 
AF AL: 
= AH 
CT 
RET 


ARI, 
H+, i 
B+ OD 
H+, 1 
#+ 0 
t+, 
ee 0 


ANS, 4 
ANS, PAO 


ARI, 
t+ i 
oe 


+ 


3 


OK 
4 


a 
+ 
pe 


ANS, 4 


ANS, PAO 


% 


% 


Ss 


+ 


MALIZULATE NEW Y COORDINATES. 


RONVYERT TO Goo: AND GUTRUT RESULT. 


FINISH HOMOGENEQUS MATRIX. 
Figure 5. TMS32020 Code for Rotation 


To combine translation, scaling, and rotation, a more 
general matrix can be implemented. 


GENERAL MATRIX FOR 
TWO-DIMENSIONAL SYSTEMS 

$$
\begin{bmatrix}
r_{11} & r_{12} & 0 \\
r_{21} & r_{22} & 0 \\
t_x & t_y & 1
\end{bmatrix}
$$

The upper 2 x 2 matrix is a combination rotation matrix 
and scaling matrix. The tx and ty values are the translation 
values. A three-dimensional general matrix can be developed 
similar to the two-dimensional translation, scaling, and 
rotation matrix. 


GENERAL MATRIX FOR 
THREE-DIMENSIONAL SYSTEMS 

$$
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & 0 \\
r_{21} & r_{22} & r_{23} & 0 \\
r_{31} & r_{32} & r_{33} & 0 \\
t_x & t_y & t_z & 1
\end{bmatrix}
$$


IMPLEMENTATION OF THE MATRIX 
MULTIPLICATION ALGORITHM 
FOR THE TMS32010 


[Flowcharts; legible box text includes INPUT THE A MATRIX BY ROWS 
and OUTPUT ANSWER] 

Figure 6. TMS32010 Flowchart 

Figure 7. TMS32020 Flowchart 


The implementation of the algorithm for the TMS32010 
shown in Figure 6 assumes that the two matrices to be 
multiplied together are of size M x N and N x P. Three major 
loops are included to multiply the two matrices. The outside 
loop control is labeled MCOUNT since it controls which row 
in the A matrix is being referenced during the multiplication. 
The secondary loop control is labeled PCOUNT because it 
counts how many columns in the B matrix have been 
processed. The inside loop control is labeled NCOUNT since 
it controls the multiplication of the values in the A matrix 
with the values in the B matrix. 
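
The three-loop structure can be sketched in Python over a flat memory array. This is an illustration of the loop nesting only, not a translation of the appendix program: the base addresses, the function name `matmul_flat`, and the use of a right shift by 12 to return each Q24 sum to Q12 are assumptions. The A matrix is stored by rows and the B matrix by columns, so both are read sequentially in the inner loop.

```python
def matmul_flat(memory, a_base, b_base, m, n, p):
    """Multiply A (M x N, stored by rows) by B (N x P, stored by columns).

    All values are Q12 integers; the result C is returned by rows in Q12.
    """
    result = []
    for mcount in range(m):              # MCOUNT: current row of A
        for pcount in range(p):          # PCOUNT: current column of B
            acc = 0
            for ncount in range(n):      # NCOUNT: multiply and accumulate
                a_val = memory[a_base + mcount * n + ncount]
                b_val = memory[b_base + pcount * n + ncount]
                acc += a_val * b_val     # Q12 x Q12 products summed in Q24
            result.append(acc >> 12)     # back to Q12
    return result


# A = [[1, 2], [3, 4]] by rows, B = [[0.5, 0, 1], [0, 0.5, 1]] by columns
memory = [4096, 8192, 12288, 16384,              # a11 a12 a21 a22
          2048, 0, 0, 2048, 4096, 4096]          # b11 b21 b12 b22 b13 b23
print(matmul_flat(memory, 0, 4, 2, 2, 3))
# [2048, 4096, 12288, 6144, 8192, 28672], i.e. [[0.5, 1, 3], [1.5, 2, 7]]
```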


IMPLEMENTATION OF THE MATRIX 
MULTIPLICATION ALGORITHM 
FOR THE TMS32020 


The implementation of the algorithm for the TMS32020 
is somewhat different since its advanced instruction set allows 
for a more efficient method of computing matrix 
multiplication. The TMS32020 version in Figure 7 also 
assumes that the two matrices to be multiplied are of size 
M x N and N x P. This program takes a row of the A matrix, 
loads it into block B0 of data memory, and then multiplies 
this row by all columns in the B matrix. The TMS32020 
continues this process until all the rows in the A matrix have 
been multiplied by all the columns in the B matrix. The 
TMS32020 version is similar to the TMS32010 in that the 
A matrix must be entered by rows and the B matrix by 
columns. This allows for a faster execution time. Figure 7 
shows the basic implementation of the matrix multiplication 
algorithm that the TMS32020 uses to multiply two matrices. 
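
The row-at-a-time approach can be sketched in Python as below. The sketch only mirrors the data flow described above; `row_buffer` stands in for the row held in block B0, the inner loop stands in for a repeated multiply-accumulate, and all names are assumptions rather than labels from the appendix program.

```python
def dot_q12(xs, ys):
    """Sum of products of two equal-length Q12 sequences, returned in Q12."""
    acc = 0
    for x, y in zip(xs, ys):   # one multiply-accumulate per element pair
        acc += x * y
    return acc >> 12           # Q24 sum back to Q12


def matmul_row_at_a_time(a_rows, b_columns):
    """Multiply A, supplied one row at a time, by B, stored by columns."""
    c = []
    for row in a_rows:
        row_buffer = list(row)  # stands in for loading the row into block B0
        # The buffered row is multiplied by every column of B in turn.
        c.append([dot_q12(row_buffer, column) for column in b_columns])
    return c


a_rows = [[4096, 8192], [12288, 16384]]           # [[1, 2], [3, 4]] in Q12
b_columns = [[2048, 0], [0, 2048], [4096, 4096]]  # columns of [[0.5, 0, 1], [0, 0.5, 1]]
print(matmul_row_at_a_time(a_rows, b_columns))
# [[2048, 4096, 12288], [6144, 8192, 28672]]
```

Storing B by columns means each dot product walks through consecutive locations, which is what makes a repeated single instruction sufficient for the inner loop.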

Since the programs in the appendices treat the matrices 
differently, a memory map is included to help in 
understanding the two versions. Figure 8 shows how the 
matrices should look in memory after they have been entered. 
Note that for the TMS32020 version, the A matrix values 
reside in program memory since the CNFP (configure as 
program memory) instruction has been executed. Note also 
that only one row of the A matrix is in this block since the 
program enters one row at a time. 


For the following matrices, 

$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\qquad
B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix}
$$

the memory would be configured in this manner for the TMS32010 and TMS32020. 


TMS32010 

DATA MEMORY 
LOCATION (IN HEX)    VALUE 
>00F                 a11 
>010                 a12 
>011                 a21 
>012                 a22 
>013                 b11 
>014                 b21 
>015                 b12 
>016                 b22 
>017                 b13 
>018                 b23 


TMS32020 

DATA MEMORY                      PROGRAM MEMORY 
LOCATION (IN HEX)    VALUE       LOCATION (IN HEX)    VALUE 
>308                 b11         >FF00                a11 
>309                 b21         >FF01                a12 
>30A                 b12 
>30B                 b22 
>30C                 b13 
>30D                 b23 


Figure 8. Memory Maps 


SUMMARY 


The TMS32010 and TMS32020 processors can be used 
to multiply large matrices efficiently. A brief review of 
matrix multiplication has been given to assist in the 
understanding of fundamental matrix multiplication. Three 
examples of graphics applications have been presented since 
these applications often require multiplication of matrices. 

The TMS320 family has the power and flexibility to 
cost-effectively implement a wide range of high-speed 
graphics, numerical analysis, digital signal processing, and 
control applications. Since the TMS32010 and TMS32020 
combine the flexibility of a high-speed controller with the 
numerical capability of an array processor, a new approach 
to applications such as graphics can now be considered. 